1. INTRODUCTION.

1.1 The regression model in goodness-of-fit.

Suppose a random sample comes from distribution $F_0(x)$ and let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics. $F_0(x)$ may be of the form $F(w)$ with $w = (x - \alpha)/\beta$; $\alpha$ is then the location parameter and $\beta$ is the scale parameter of $F_0(x)$. There may be other parameters in $F(w)$, for example, a shape parameter; here we assume such parameters known, but $\alpha$ and $\beta$ are unknown. We can suppose the random sample of X-values to have been constructed from a random sample $w_1, w_2, \ldots, w_n$ from $F(w)$, by the transformation

$$X_i = \alpha + \beta w_i. \qquad (1)$$

If the order statistics of the w-sample are $w_{(1)} < w_{(2)} < \cdots < w_{(n)}$, we have also

$$X_{(i)} = \alpha + \beta w_{(i)}. \qquad (2)$$


Let $E(w_{(i)})$ be $m_i$ and let $v_{ij}$ be $E(w_{(i)} - m_i)(w_{(j)} - m_j)$; let $V$ be the $n \times n$ matrix with entries $v_{ij}$. $V$ is the covariance matrix of the order statistics $w_{(i)}$. From (2) we have

$$E(X_{(i)}) = \alpha + \beta m_i \qquad (3)$$

and a plot of $X_{(i)}$ against $m_i$ should be approximately a straight line with intercept $\alpha$ on the vertical axis and slope $\beta$. The values $m_i$ are the most natural values to plot along the horizontal axis to achieve a straight line plot, but for most distributions they are difficult to calculate.


Various authors have therefore proposed alternatives $T_i$ which are convenient functions of $i$; then (2) can be replaced by the model

$$X_{(i)} = \alpha + \beta T_i + \epsilon_i \qquad (4)$$

where $\epsilon_i$ is an "error" which has mean zero only for $T_i = m_i$. A common choice for $T_i$ is $H_i = F^{-1}\{i/(n+1)\}$ or similar expressions which approximate $m_i$. A test of

$$H_0: \text{the X-sample comes from } F_0(x), \qquad (5)$$

can then be based on how well the data fits the line (3) or (4).


1.2 Example. As an example, suppose it is desired to test that the X-sample is normally distributed, with unknown mean $\mu$ and variance $\sigma^2$. Then

$$F(w) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{w} e^{-t^2/2}\,dt,$$

and the w-sample is standard normal. Then (1) becomes

$$X_i = \mu + \sigma w_i$$

and (3) is

$$E(X_{(i)}) = \mu + \sigma m_i$$

where $m_i$ are the expected values of standard normal order statistics. For this distribution, $\alpha = \mu$ and $\beta = \sigma$.


1.3 Measures of fit.

The practice of plotting the $X_{(i)}$ against $m_i$ (or against another set of constants $T_i$ which approximate the $m_i$-values), and looking to see if a straight line results, is time-honored as a quick technique for testing normality. An improvement on this procedure by eye is to measure how well the data fits the line (3). Three main approaches to measuring the fit can be identified. The first is simply to measure the correlation coefficient $R(X,T)$ between the paired sets $X_{(i)}$ and $T_i$. A second method is to estimate the line $\alpha + \beta T_i$, using generalized least squares to take into account the covariance of the order statistics, and then to base the test of fit on the sum of squares of residuals. Finally, a third technique is to estimate $\beta$ from (2) using generalized least squares, and to compare this estimate with the estimate of scale given by the sample standard deviation. In this article we explore the first two of these methods, which are often closely connected.


1.4 The correlation coefficient.

The simplest of the three methods above is to use the correlation coefficient $R(X,T)$. Here we extend the usual meaning of correlation, and also that of variance and covariance, to apply to constants as well as random variables. Thus let $X$ refer to the vector $X_{(1)}, \ldots, X_{(n)}$, and $T$ to the vector $T_1, \ldots, T_n$; let $\bar X = \sum X_{(i)}/n$ and $\bar T = \sum T_i/n$ (all sums are for $i = 1$ to $n$) and define the sums

$$S(X,T) = \sum (X_{(i)} - \bar X)(T_i - \bar T) = \sum X_{(i)} T_i - n \bar X \bar T;$$

$$S(X,X) = \sum (X_{(i)} - \bar X)^2 = \sum (X_i - \bar X)^2;$$

$$S(T,T) = \sum (T_i - \bar T)^2.$$

$S(X,X)$ will often be called $S^2$. The variance of $X$ is then $V(X,X) = S(X,X)/(n-1)$, the variance of $T$ is $V(T,T) = S(T,T)/(n-1)$, and the covariance of $X$ and $T$ is $V(X,T) = S(X,T)/(n-1)$. The correlation coefficient between $X$ and $T$ is

$$R(X,T) = \frac{V(X,T)}{\{V(X,X)\,V(T,T)\}^{1/2}} = \frac{S(X,T)}{\{S(X,X)\,S(T,T)\}^{1/2}}.$$


Statistics $R(X,m)$ (called sometimes $R$) or $R^2(X,m)$ are attractive statistics for testing the fit of $X$ to the model (2), since if a "perfect" sample is given, that is, a sample whose ordered values fall exactly at their expected values, $R(X,m)$ will be 1, and the value of $R(X,m)$ can be interpreted as a measure of how closely the sample resembles a perfect sample. Then tests based on $R(X,m)$, or equivalently on $R^2(X,m)$, will be one-tailed; rejection of $H_0$ occurs only for low values of $R$.

Suppose $\hat X_{(i)} = \hat\alpha + \hat\beta T_i$, where $\hat\alpha$ and $\hat\beta$ are the usual regression estimators of $\alpha$ and $\beta$ (ignoring the covariance between the $X_{(i)}$). It is possible to set up the standard ANOVA table for straight line regression:


    Source        Sum of squares
    Regression    $S^2(X,T)/S(T,T)$
    Error         $S^2 - S^2(X,T)/S(T,T) = \sum (X_{(i)} - \hat X_{(i)})^2$
    Total         $S(X,X)$

and it is clear that

$$\frac{\text{Error SS}}{\text{Total SS}} = 1 - R^2(X,T).$$


Define, for any $T$ vector,

$$Z(X,T) = n\{1 - R^2(X,T)\}.$$

Then $Z(X,T)$ is a test statistic equivalent to $R^2(X,T)$, based on the sum of squares of the residuals after the line (3) has been fitted. $Z(X,T)$ has, in common with many other goodness-of-fit statistics (e.g., chi-square and EDF statistics), the property that the larger $Z(X,T)$ is, the worse the fit. Sarkadi [1975] and more recently Gerlach [1979] have shown consistency for correlation tests based on $R(X,m)$, or equivalently $Z(X,m)$, for a wide class of distributions including all the usual continuous distributions.

This is to be expected, since for large $n$ we expect a sample to become perfect in the sense above. We can expect the consistency property to extend to $R(X,T)$ provided $T$ approaches $m$ sufficiently rapidly for large samples.
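The quantities of Section 1.4 can be computed directly from their definitions. The following is a minimal sketch, not code from the original report; the function names `correlation_z` and `anova_sums` are hypothetical, and the plotting positions in the usage are assumed to be $T_i = i/(n+1)$.

```python
def _sums(x, t):
    """Sort x into order statistics and return the centred sums S(X,T), S(X,X), S(T,T)."""
    n = len(x)
    if n != len(t) or n < 3:
        raise ValueError("x and t must have the same length, at least 3")
    xs = sorted(x)  # order statistics X_(1) <= ... <= X_(n)
    xbar = sum(xs) / n
    tbar = sum(t) / n
    s_xt = sum((xi - xbar) * (ti - tbar) for xi, ti in zip(xs, t))
    s_xx = sum((xi - xbar) ** 2 for xi in xs)
    s_tt = sum((ti - tbar) ** 2 for ti in t)
    return xs, xbar, tbar, s_xt, s_xx, s_tt


def correlation_z(x, t):
    """Return (R^2, Z) with Z = n * (1 - R^2), for constants t given in increasing order."""
    _, _, _, s_xt, s_xx, s_tt = _sums(x, t)
    r2 = s_xt ** 2 / (s_xx * s_tt)
    return r2, len(x) * (1.0 - r2)


def anova_sums(x, t):
    """Return (Error SS, Total SS) for the ordinary least-squares line of X_(i) on T_i."""
    xs, xbar, tbar, s_xt, s_xx, s_tt = _sums(x, t)
    beta_hat = s_xt / s_tt
    alpha_hat = xbar - beta_hat * tbar
    error_ss = sum((xi - alpha_hat - beta_hat * ti) ** 2 for xi, ti in zip(xs, t))
    return error_ss, s_xx
```

For a sample `x` of size `n`, calling `correlation_z(x, [i / (n + 1) for i in range(1, n + 1)])` returns $R^2$ and $Z$; a "perfect" sample lying exactly on a line in $T$ gives $R^2 = 1$ and $Z = 0$, and the ratio of the two values returned by `anova_sums` equals $1 - R^2$.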


1.5 Censored data. $R^2(X,T)$ can easily be calculated for censored data, provided the ranks of the available $X_{(i)}$ are known. These are paired with the appropriate $T_i$ and $R^2(X,T)$ is calculated using the same formula as above, with the sums running over the known $i$. For example, if the data were right-censored, so that only the $r$ smallest values $X_{(i)}$ were available, the sums would run for $i$ from 1 to $r$; if the data were left-censored, with the first $s$ values missing, the $i$ would run from $s+1$ to $n$. Tables of $Z(X,T)$ for $T = m$ or $H$, for testing for the uniform, normal, exponential, logistic, or extreme-value distributions have been published by Stephens (1986).


2. CORRELATION TESTS FOR THE UNIFORM DISTRIBUTION.

For the uniform distribution for $X$, between limits $(a,b)$, written $U(a,b)$, we have $F(w) = w$, $0 < w < 1$, and $X_i = a + (b - a) w_i$; hence $\alpha = a$, $\beta = b - a$. Then $m_i = E\{w_{(i)}\} = i/(n+1)$; also $H_i = m_i$. The order statistics $X_{(i)}$ could be plotted against $i$ instead of against $i/(n+1)$; the scale factor $1/(n+1)$ does not change the correlation coefficient, and $R(X,m) = R(X,H) = R(X,T)$ where $T_i = i$.


In discussing tests for the uniform distribution, we distinguish four cases:

Case 0: $a$, $b$ both known;
Case 1: $a$ unknown, but $(b-a)$ known;
Case 2: $a$ known, $(b-a)$ unknown;
Case 3: both $a$ and $b$ unknown.


Case 0. Here $a$ and $b$ are both known, so that $\alpha$ and $\beta$ are known in (1). The transformation $X' = (X-a)/(b-a)$ then reduces the problem to a test that $X'$ is $U(0,1)$. There are of course many tests for this special case (see, e.g., Stephens, 1986). In the present context, the test will be based on the residuals from the known line $F(x') = x'$, $0 < x' < 1$; that is, on the statistic $Z_0 = \sum \{X'_{(i)} - i/(n+1)\}^2$. It is clear that $Z_0$ has the same asymptotic distribution as the well-known Cramér-von Mises statistic $W^2 = \sum \{X'_{(i)} - (2i-1)/(2n)\}^2 + 1/(12n)$, and, for small samples, the two statistics will have much the same power properties.

Case 1. Here the model is $X_{(i)} = \alpha + \beta w_{(i)}$, with $\beta = b - a$ known. Substitute $X'_{(i)} = X_{(i)}/\beta$; then the model becomes $X'_{(i)} = \alpha/\beta + w_{(i)}$, and $E(X'_{(i)}) = \alpha' + (m_i - \bar m)$, where $\alpha' = \alpha/\beta + \bar m$. Ordinary least squares gives $\hat\alpha' = \bar X'$. Hence $\hat X'_{(i)} = \bar X' + m_i - 0.5$ and the test statistic based on residuals is $Z_1 = \sum \{X'_{(i)} - \bar X' - (m_i - 0.5)\}^2$. $Z_1$ has similar properties to the Watson $U^2$ statistic

$$U^2 = \sum \{X'_{(i)} - \bar X' - [(2i-1)/(2n) - 0.5]\}^2 + 1/(12n),$$

and has the same asymptotic distribution.
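The residual statistics for Cases 0 and 1 can be sketched next to their EDF counterparts. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the symbol $Z_0$ for the Case 0 statistic is a label adopted here by analogy with $Z_1$.

```python
def case0_statistics(x, a, b):
    """Case 0 (a and b known): return (Z0, W^2) after mapping the sample to (0, 1)."""
    n = len(x)
    xs = sorted((xi - a) / (b - a) for xi in x)
    # Residuals from the known line, using m_i = i / (n + 1).
    z0 = sum((xi - i / (n + 1)) ** 2 for i, xi in enumerate(xs, start=1))
    # Cramer-von Mises statistic, using (2i - 1) / (2n) instead of m_i.
    w2 = sum((xi - (2 * i - 1) / (2 * n)) ** 2
             for i, xi in enumerate(xs, start=1)) + 1 / (12 * n)
    return z0, w2


def case1_statistics(x, scale):
    """Case 1 (b - a = scale known, a unknown): return (Z1, Watson U^2)."""
    n = len(x)
    xs = sorted(xi / scale for xi in x)
    xbar = sum(xs) / n
    z1 = sum((xi - xbar - (i / (n + 1) - 0.5)) ** 2
             for i, xi in enumerate(xs, start=1))
    u2 = sum((xi - xbar - ((2 * i - 1) / (2 * n) - 0.5)) ** 2
             for i, xi in enumerate(xs, start=1)) + 1 / (12 * n)
    return z1, u2
```

Both Case 1 statistics are unchanged when a constant is added to every observation, which is why they remain usable when the origin $a$ is unknown.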


Case 2. For Cases 2 and 3 the situation becomes much harder, and considerable analysis is required to obtain the asymptotic distributions of the test statistic $n\sum (X_{(i)} - \hat X_{(i)})^2 / \sum (X_{(i)} - \bar X)^2$, the denominators being necessary, and complicating the analysis, because for these two cases the scale must be estimated. We state the results and give proofs later. For Case 2, the model is $E(X_{(i)}) = \alpha + \beta m_i$, with $\beta$ unknown and $\alpha$ known. Set $X'_{(i)} = X_{(i)} - \alpha$, so that $E(X'_{(i)}) = \beta m_i$, and estimate $\beta$ by least squares; then $\hat\beta = \sum X'_{(i)} m_i / \sum m_i^2$. Thus $\hat X_{(i)} = \alpha + \hat\beta m_i$, and the test statistic is

$$Z_2 = n \sum \{X_{(i)} - \hat X_{(i)}\}^2 / \sum \{X_{(i)} - \bar X\}^2.$$

$Z_2$ has the same asymptotic distribution as $Z_2^* = \sum v_i/\lambda_i$ where $v_i$, for $i = 1, 2, \ldots$, are independent $\chi_1^2$ variables; $\lambda_i$ is an infinite set of positive weights given by $\lambda_i = \theta_i^2$, where $\theta_i$ are the solutions of $\tan\theta_i = \theta_i$, $\theta_i > 0$ (see Section 3 below).

Case 3. For Case 3, the model is $E(X_{(i)}) = \alpha + \beta(m_i - 0.5)$, with $\alpha$, $\beta$ unknown, and least squares gives $\hat\alpha = \bar X$ and $\hat\beta = \sum \{(X_{(i)} - \bar X) m_i\} / \sum (m_i - \bar m)^2$. The test statistic is now the correlation coefficient $R(X,m)$ or equivalently $Z_3 = Z(X,m)$;

$$Z_3 = n\{1 - R^2(X,m)\} = n \sum \{X_{(i)} - \hat X_{(i)}\}^2 / \sum \{X_{(i)} - \bar X\}^2,$$

where $\hat X_{(i)} = \hat\alpha + \hat\beta(m_i - 0.5)$. $Z_3$ has the same asymptotic distribution as $Z_3^* = \sum v_i/\lambda_i$, where, as above, $v_i$ are independent $\chi_1^2$ variables. The constants $\lambda_i$ are positive weights given in two infinite sets:

Set 1: $\lambda_i = 4\pi^2 i^2$, $i = 1, 2, \ldots$.

Set 2: $\lambda_k = 4\phi_k^2$, $k = 1, 2, \ldots$, where $\phi_k$ are the solutions of $\tan\phi_k = \phi_k$, $\phi_k > 0$.

The derivation of the weights for Cases 2 and 3 will be given in the next section.
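The weights for Cases 2 and 3 are easy to generate numerically. The sketch below is one possible way to do it and is not taken from the original: the function names are hypothetical, and plain bisection is assumed as the root finder, using the fact that the $k$-th positive solution of $\tan x = x$ lies between $k\pi$ and $k\pi + \pi/2$.

```python
import math


def tan_root(k, iterations=60):
    """k-th positive root of tan(x) = x, found by bisection on sin(x) - x*cos(x)."""
    lo, hi = k * math.pi, k * math.pi + math.pi / 2
    g_lo = math.sin(lo) - lo * math.cos(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        g_mid = math.sin(mid) - mid * math.cos(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid  # root is in the upper half
        else:
            hi = mid               # root is in the lower half
    return 0.5 * (lo + hi)


def case2_weights(m):
    """First m weights lambda_j = theta_j^2 for Case 2."""
    return [tan_root(j) ** 2 for j in range(1, m + 1)]


def case3_weights(m):
    """First m weights from each of Set 1 and Set 2 for Case 3, merged and sorted."""
    set1 = [4.0 * math.pi ** 2 * i ** 2 for i in range(1, m + 1)]
    set2 = [4.0 * tan_root(k) ** 2 for k in range(1, m + 1)]
    return sorted(set1 + set2)
```

A simple consistency check is the sum of reciprocals: for the Case 3 weights the partial sums of $1/\lambda_i$ approach 1/15, the mean of $Z_3^*$ quoted in Section 3.3, and for the Case 2 weights they approach 1/10, a standard identity for the roots of $\tan\theta = \theta$.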


3. ASYMPTOTIC PROPERTIES OF Z(X,m).

3.1 Case 3. It is convenient to give the asymptotic results for Case 3 (the more difficult case) first. Suppose, without loss of generality, that the sample comes from $U(0,1)$. However, the model is fitted without this knowledge; thus the fitted model is

$$X_{(i)} = \alpha + \beta(m_i - \bar m) + \epsilon_i. \qquad (6)$$

As stated in Section 2, this leads to the test statistic

$$Z(X,m) = n\{1 - R^2(X,m)\} = \sum (X_{(i)} - \hat X_{(i)})^2 \big/ \{\sum (X_{(i)} - \bar X)^2/n\}.$$

Asymptotically, the denominator tends to 1/12; thus we must study $X_{(i)} - \hat X_{(i)}$. This may be written

$$X_{(i)} - \hat\alpha - \hat\beta(m_i - \bar m) = X_{(i)} - \bar X - (\hat\beta - 1)(m_i - \bar m) - (m_i - \bar m)$$
$$= X_{(i)} - m_i - (\bar X - \bar m) - (\hat\beta - 1)(m_i - \bar m). \qquad (7)$$

The terms on the right hand side of (7) can be expressed in terms of the quantile process $Q_n(t) = \sqrt{n}\{X_{([nt])} - m_{[nt]}\}$, $0 \le t \le 1$, where $[nt]$ is the greatest integer in $nt$. For $t$ given by $i/n$, we have

$$\sqrt{n}(X_{(i)} - m_i) = Q_n(t);$$

$$\sqrt{n}(\bar X - \bar m) = \int_0^1 Q_n(s)\,ds + O_p(n^{-1/2});$$

$$\sqrt{n}(\hat\beta - 1) = \frac{n^{-1/2} \sum (X_{(i)} - \bar X - m_i + \bar m)(m_i - \bar m)}{\sum (m_i - \bar m)^2/n} = 12 \int_0^1 (t - \tfrac12)\Big\{Q_n(t) - \int_0^1 Q_n(s)\,ds\Big\}\,dt + O_p(n^{-1/2}),$$

recalling that $\bar m = \tfrac12$ and $\sum (m_i - \bar m)^2/n \to 1/12$. It is convenient to define the process $Y_n(t) = Q_n(t) - \int_0^1 Q_n(s)\,ds$. Then insertion of the above expressions into (7) gives


$$\sqrt{n}(X_{(i)} - \hat X_{(i)}) = Y_n(t) - (t - \tfrac12) \int_0^1 12(u - \tfrac12)\,Y_n(u)\,du + O_p(n^{-1/2}). \qquad (8)$$

As $n \to \infty$ let $Q(t)$, $Y(t)$ be the limiting processes for $Q_n(t)$ and $Y_n(t)$ respectively. $Q(t)$ is the well-known Brownian bridge with mean $E\{Q(t)\} = 0$ and covariance $\rho_Q(s,t) = \min(s,t) - st$. $Y(t)$ then has mean 0 and covariance

$$\rho_Y(s,t) = \min(s,t) - st - \tfrac12 s(1-s) - \tfrac12 t(1-t) + 1/12.$$

The process $Y(t)$ has already been studied in connection with the Watson statistic $U^2$ (Watson, 1961; Stephens, 1976). For the asymptotic distribution of $Z(X,m)$ we now need the distribution of

$$Z^* = \int_0^1 W^2(t)\,dt, \qquad (9)$$

where, from (8), we have

$$W(t) = Y(t) - (t - \tfrac12) \int_0^1 12(u - \tfrac12)\,Y(u)\,du. \qquad (10)$$


The covariance function of $W(t)$ requires considerable algebra but the calculation is straightforward; the result may be expressed as

$$\rho_W(s,t) = \rho_Y(s,t) - \psi(s)' A\, \psi(t) \qquad (11)$$

where $\psi(s)'$ is the transpose of $\psi(s)$, and $\psi(s)$ is the 2-component vector $\{(s - \tfrac12);\ s(1-s)(2s-1)\}$; $A$ is the $2 \times 2$ matrix with rows $(-\tfrac15, 1)$ and $(1, 0)$. The calculation of the distribution of $Z^*$ now follows well-known lines (see, for example, Durbin, 1973, or Stephens, 1976); $Z^*$ has the same distribution as $\sum_i v_i/\lambda_i$, where $i$ runs from 1 to $\infty$, $v_i$ are independent $\chi_1^2$ variables, and where $\lambda_i$ are weights, found by solving the integral equation

$$\lambda \int_0^1 f(s)\,\rho_W(s,t)\,ds = f(t) \qquad (12)$$

for eigenvalues $\lambda_i$ and eigenfunctions $f_i(t)$.


The solution of (12) is found as follows. The covariance $\rho_W(s,t)$ can be expressed as $\rho_W(s,t) = \min(s,t) + g(s,t)$, with

$$g(s,t) = \tfrac65 st - \tfrac{11}{10} s + 2s^2 - s^3 - \tfrac{11}{10} t + 2t^2 - t^3 + \tfrac{2}{15} - 3st^2 + 2st^3 - 3s^2 t + 2s^3 t.$$

Differentiation of (12) twice with respect to $t$ then gives

$$-f(t) + 4\int_0^1 f(s)\,ds - 6t\int_0^1 f(s)\,ds - 6\int_0^1 s f(s)\,ds + 12t\int_0^1 s f(s)\,ds = \frac{1}{\lambda} f''(t). \qquad (13)$$

Differentiation again gives

$$-f'(t) - 6\int_0^1 f(s)\,ds + 12\int_0^1 s f(s)\,ds = \frac{1}{\lambda} f'''(t) \qquad (14)$$

and finally

$$-f''(t) = \frac{1}{\lambda} f''''(t).$$


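Thus

$$f(t) = A\cos\sqrt{\lambda}\,t + B\sin\sqrt{\lambda}\,t + Ct + D. \qquad (15)$$

Suppose $f(s)$ is normalized, so that $\int_0^1 f(s)\,ds = 1$, and let $K = \int_0^1 s f(s)\,ds$. Set $\theta = \sqrt{\lambda}$; then $\int_0^1 f(s)\,ds = 1$ gives

$$\frac{A}{\theta}\sin\theta - \frac{B}{\theta}(\cos\theta - 1) + \frac{C}{2} + D = 1. \qquad (16)$$

Also

$$K = \int_0^1 s f(s)\,ds = A I_1 + B I_2 + \frac{C}{3} + \frac{D}{2}, \qquad (17)$$

where

$$I_1 = \int_0^1 s\cos\theta s\,ds = \frac{\theta\sin\theta + \cos\theta - 1}{\theta^2}, \qquad I_2 = \int_0^1 s\sin\theta s\,ds = \frac{\sin\theta - \theta\cos\theta}{\theta^2}.$$

Substituting $f(t)$ into (13) gives $-Ct - D + 4 - 6t - 6K + 12Kt = 0$ for all $t$; thus, equating coefficients, we have

$$-C - 6 + 12K = 0,$$

$$-D + 4 - 6K = 0.$$

Hence $\frac{C}{3} + \frac{D}{2} = K$, and $C + 2D = 2$.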


Thus from (16) we have $A\sin\theta - B(\cos\theta - 1) = 0$, and from (17) we have $A I_1 + B I_2 = 0$. Hence $\theta$ must satisfy

$$\frac{\sin\theta}{\cos\theta - 1} = \frac{B}{A} = -\frac{I_1}{I_2} = \frac{1 - \theta\sin\theta - \cos\theta}{\sin\theta - \theta\cos\theta}. \qquad (18)$$

So $\theta$ satisfies $2 - \theta\sin\theta - 2\cos\theta = 0$, by cross-multiplication of (18). Let $\phi = \theta/2$; then $2 - 4\phi\sin\phi\cos\phi - 2[1 - 2\sin^2\phi] = 0$, and hence $\sin\phi = 0$ or $\sin\phi - \phi\cos\phi = 0$. Then $\phi_i = \pi i$, $i = 1, 2, \ldots$; or alternatively $\phi_k$ is the solution of $\tan\phi_k = \phi_k$, $k = 1, 2, \ldots$. Finally, $\lambda_i = 4\phi_i^2 = 4\pi^2 i^2$ for the first $\lambda$-set, and $\lambda_k = 4\phi_k^2$ for the second $\lambda$-set.

3.2 Case 2. For Case 2 the test statistic is

$$n\sum (X_{(i)} - \hat X_{(i)})^2 \big/ \sum (X_{(i)} - \bar X)^2 = Z_2.$$

We can take $\alpha = 0$ in the model $E(X_{(i)}) = \alpha + \beta m_i$, so that this becomes $E(X_{(i)}) = \beta m_i$, with

$$\hat\beta = \frac{\sum X_{(i)} m_i}{\sum m_i^2}. \quad \text{Hence} \quad \hat\beta - 1 = \frac{\sum (X_{(i)} - m_i) m_i}{\sum m_i^2}.$$

Similar reasoning to that for Case 3 gives the asymptotic distribution of $Z_2$ to be that of $12\int_0^1 W_2^2(t)\,dt$ where

$$W_2(t) = Q(t) - 3t\int_0^1 s\,Q(s)\,ds. \qquad (19)$$

$Q(t)$ is as defined in the previous section, and then $W_2(t)$ is a Gaussian process with mean 0; its covariance function (after some algebra) is

$$\rho_2(s,t) = \min(s,t) - \tfrac95 st + \tfrac12 s^3 t + \tfrac12 s t^3. \qquad (20)$$

Thus for the weights in the asymptotic distribution of $Z_2$, we need eigenvalues of $\lambda\int_0^1 \rho_2(s,t) f(s)\,ds = f(t)$. Similar steps to those for Case 3 give $f(t) = A\cos\theta t + B\sin\theta t + Ct + D$ with $\theta = \sqrt{\lambda}$, as before. Also, $f(0) = 0$, so $D = -A$, and

$$-f(t) + 3t\int_0^1 s f(s)\,ds = \frac{1}{\lambda} f''(t). \qquad (21)$$

Thus $f''(0) = 0$, so $D = A = 0$. Then, from (21), we have

$$-B\sin\theta t - Ct + 3t\Big[B\int_0^1 s\sin\theta s\,ds + \int_0^1 C s^2\,ds\Big] = -B\sin\theta t.$$

Hence $\int_0^1 s\sin\theta s\,ds = 0$; thus $\theta_j$ is the solution of $\sin\theta_j - \theta_j\cos\theta_j = 0$, that is, $\tan\theta_j = \theta_j$, $j = 1, 2, \ldots$. Finally, $\lambda_j = \theta_j^2$. These are the weights given in Section 2.
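The "some algebra" behind the covariance function (20) can be sketched from (19) and the Brownian bridge covariance $\rho_Q(s,t) = \min(s,t) - st$ alone. The intermediate integrals below are worked out for illustration and are not displayed in the original.

```latex
\begin{align*}
\int_0^1 u\,\rho_Q(s,u)\,du
  &= \int_0^s u^2\,du + s\int_s^1 u\,du - \frac{s}{3}
   = \frac{s(1-s^2)}{6},\\
\int_0^1\!\!\int_0^1 uv\,\rho_Q(u,v)\,du\,dv
  &= \int_0^1 \frac{v^2(1-v^2)}{6}\,dv = \frac{1}{45},\\
\rho_2(s,t)
  &= \rho_Q(s,t) - 3t\,\frac{s(1-s^2)}{6} - 3s\,\frac{t(1-t^2)}{6} + 9st\cdot\frac{1}{45}\\
  &= \min(s,t) - \tfrac{9}{5}\,st + \tfrac12\,st\,(s^2+t^2).
\end{align*}
```

As a check, $\int_0^1 \rho_2(s,s)\,ds = \tfrac12 - \tfrac35 + \tfrac15 = \tfrac1{10}$, which agrees with the sum of $1/\theta_j^2$ over the positive roots of $\tan\theta = \theta$.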

3  j 

3.3 Asymptotic percentage points. The next step is to calculate the percentage points of, say, $Z_3^* = \sum_i v_i/\lambda_i$, where $\lambda_i$ are the weights for Case 3. The mean $\mu_3$ of $Z_3^*$ is $\int_0^1 \rho_3(s,s)\,ds = 1/15$. The 80 largest $\lambda_i^{-1}$ were found, and $Z_3^*$ was approximated by $S_3 = S^* + T$, where $S^* = \sum_{i=1}^{80} v_i/\lambda_i$ and $T = \mu_3 - \sum_{i=1}^{80} \lambda_i^{-1}$. $S_3$ differs from $Z_3^*$ by $\sum_{i=81}^{\infty} \lambda_i^{-1}(v_i - 1)$, which is a random variable with mean 0 and variance

$$2\int_0^1\!\!\int_0^1 \rho_3^2(s,t)\,ds\,dt - 2\sum_{i=1}^{80} \lambda_i^{-2};$$

this value is found to be negligibly small. Thus critical points of $Z_3^*$ are found by finding those of $S^*$, using Imhof's (1961) method for a finite sum of weighted $\chi^2$ variables, and then adding $T$.
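The truncation step can be illustrated numerically. The sketch below is an assumption-laden illustration, not the procedure used for the tables: it replaces Imhof's method by plain Monte Carlo sampling of the truncated sum, the function names are hypothetical, and the tail variance is approximated from a long but finite list of eigenvalues.

```python
import math
import random


def tan_root(k, iterations=60):
    """k-th positive root of tan(x) = x, by bisection on (k*pi, k*pi + pi/2)."""
    lo, hi = k * math.pi, k * math.pi + math.pi / 2
    g_lo = math.sin(lo) - lo * math.cos(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        g_mid = math.sin(mid) - mid * math.cos(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def case3_eigenvalues(m):
    """Sorted Case 3 eigenvalues: m from Set 1 and m from Set 2."""
    set1 = [4.0 * math.pi ** 2 * i ** 2 for i in range(1, m + 1)]
    set2 = [4.0 * tan_root(k) ** 2 for k in range(1, m + 1)]
    return sorted(set1 + set2)


def truncate(eigenvalues, keep=80, mean=1.0 / 15.0):
    """Split off the `keep` largest weights 1/lambda; return (head, T, tail variance)."""
    eigs = sorted(eigenvalues)
    head, tail = eigs[:keep], eigs[keep:]
    t_const = mean - sum(1.0 / lam for lam in head)    # T = mu - sum of kept weights
    tail_var = 2.0 * sum(1.0 / lam ** 2 for lam in tail)  # variance of the dropped part
    return head, t_const, tail_var


def mc_upper_point(head, t_const, level, reps=20000, seed=1):
    """Monte Carlo estimate of the upper-tail `level` point of S* + T."""
    rng = random.Random(seed)
    draws = sorted(
        t_const + sum(rng.gauss(0.0, 1.0) ** 2 / lam for lam in head)
        for _ in range(reps)
    )
    return draws[int((1.0 - level) * reps)]
```

With `head, T, var = truncate(case3_eigenvalues(3000))`, the constant `T` is a little over 0.001 and the tail variance is of order $10^{-8}$, consistent with the remark that the neglected part is negligibly small.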


3.4 Tables. Tables 1 and 2 give percentage points for $Z_2$ and $Z_3$ respectively. Those for $n$ finite have been obtained by Monte Carlo sampling. The last line in each table contains the asymptotic points. Table 1 also gives points for a modification of $Z_2$, called $Z_{2A}$. This is the statistic (using the terminology for Case 2 in Section 2)

$$Z_{2A} = n\sum (X_{(i)} - \hat X_{(i)})^2 \big/ \sum X_{(i)}^2.$$

This uses the quantity $\sum X_{(i)}^2/n$ to eliminate the square of the scale instead of the sample variance. This is a natural denominator in Case 2 with $\alpha = 0$, where the model is $E(X_{(i)}) = \beta m_i$. (If $\alpha$ is not zero, the new variable $X'_{(i)} = X_{(i)} - \alpha$ would be used instead of $X_{(i)}$.) The asymptotic points for $Z_{2A}$ are 0.25 times those for $Z_2$; the advantage in using $Z_{2A}$ is that the statistic is much less variable for small $n$. For $Z_3$, Table 2 has already been produced in Stephens (1986), although with less accurate points; there will be negligible difference in practical use.

3.5 Use of the Tables with censored data. Suppose origin and scale are both unknown (Case 3), and the data is censored at both ends. Thus $n^* = r - k + 1$ observations are available, consisting of all those between $X_{(k)}$ and $X_{(r)}$. $R(X,T)$ may be calculated, using the usual formula, but with sums for $i$ from $k$ to $r$, and with $T_i = i/(n+1)$ or $T_i = i$, or even $T_k, T_{k+1}, \ldots$ equal to $1, 2, \ldots, n^*$, these latter values for $T_i$ being possibilities because $R(X,m)$ is scale and location invariant. Then $n^*\{1 - R^2(X,T)\} = Z(X,T)$ will be referred to Table 2, using the values for sample size $n^*$.
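The censored calculation can be sketched as follows. This is an illustrative sketch only: the function name `censored_z` and its `positions` argument are hypothetical, and the data in the usage note are made up for the purpose of showing the invariance.

```python
def censored_z(x_obs, k, r, n, positions="uniform"):
    """Z = n* (1 - R^2) from the observed order statistics X_(k), ..., X_(r).

    positions: "uniform" uses T_i = i/(n+1), "rank" uses T_i = i,
               "index" uses 1, 2, ..., n*.
    """
    n_star = r - k + 1
    if len(x_obs) != n_star or n_star < 3:
        raise ValueError("need exactly r - k + 1 >= 3 observations")
    xs = sorted(x_obs)
    if positions == "uniform":
        t = [i / (n + 1) for i in range(k, r + 1)]
    elif positions == "rank":
        t = [float(i) for i in range(k, r + 1)]
    elif positions == "index":
        t = [float(i) for i in range(1, n_star + 1)]
    else:
        raise ValueError("unknown positions option")
    xbar = sum(xs) / n_star
    tbar = sum(t) / n_star
    s_xt = sum((xi - xbar) * (ti - tbar) for xi, ti in zip(xs, t))
    s_xx = sum((xi - xbar) ** 2 for xi in xs)
    s_tt = sum((ti - tbar) ** 2 for ti in t)
    r2 = s_xt ** 2 / (s_xx * s_tt)
    return n_star * (1.0 - r2)
```

For example, with seven made-up observations holding ranks 3 to 9 in a sample of 12, `censored_z(x, 3, 9, 12, "uniform")`, `censored_z(x, 3, 9, 12, "rank")` and `censored_z(x, 3, 9, 12, "index")` agree up to rounding, because each choice of $T_i$ is a linear function of the others.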

3.6 Example. It is well-known that if times $Q_{(i)}$, $i = 1, 2, \ldots, n$, represent times of random events, occurring in order with the same rate, the $Q_{(i)}$ should be proportional to uniform order statistics. Thus the $Q_{(i)}$ may be regressed against $i/(n+1)$ or equivalently against $i$ as described above, to test that the events are random. Suppose $Q_{(9)}, Q_{(10)}, \ldots, Q_{(20)}$ represent a subset of such times, denoting times of breakdown of an industrial process. We wish to test that these are uniform; times $Q_{(1)}$ to $Q_{(8)}$ have been omitted because the process took time to stabilize and these are not expected to have occurred at the same rate as the later times. The times $Q_{(9)}, \ldots, Q_{(20)}$ are … 120, 135, 137, 142, 162, 163, 210, 228, 233, 261. The value of $Z(Q,T) = 12\{1 - R^2(Q,T)\} = 0.464$. Reference to Table 2 at line $n = 12$ shows that there is not significant evidence, at the 10% level, to reject the hypothesis of uniformity.

4. THE CORRELATION COEFFICIENT: GENERAL CASE.

4.1 The general case. We now discuss, in a non-rigorous fashion, the distribution of $Z(X,m)$ for the general test of $H_0$ given in (5). $F_0(x)$ is assumed to be a continuous distribution, and the sample can be left- and right-censored. Thus we observe $X_{(k)}, \ldots, X_{(r)}$ from a sample of size $n$ from the distribution $F_0(x)$. We can assume the sample comes from $F_0(x)$ with $\alpha = 0$ and $\beta = 1$, that is, from $F(x)$, although (3) is fitted without this knowledge. Suppose $f(x)$ is the density corresponding to $F(x)$. Then using $H_i = F^{-1}\{i/(n+1)\}$ we have

$$Z(X,m) = n\{1 - R^2(X,m)\} = \frac{\sum_k^r \{X_{(i)} - H_i - \hat\alpha - (\hat\beta - 1) H_i\}^2}{\frac{1}{n}\sum_k^r (H_i - \bar H)^2}.$$

Define $p = (k-1)/n$ and $q = r/n$ and let $q^* = F^{-1}(q)$ and $p^* = F^{-1}(p)$.


Also, let

$$Y(t) = Q(t) - \int_p^q Q(s)\,ds - \frac{F^{-1}(t) - \mu}{\sigma} \int_p^q \frac{F^{-1}(s) - \mu}{\sigma}\,Q(s)\,ds,$$

where $Q(t) = \sqrt{n}\{X_{([nt])} - F^{-1}(t)\}$, and parameters $\mu$ and $\sigma$ are given by

$$\mu = \int_p^q F^{-1}(s)\,ds = \int_{p^*}^{q^*} x f(x)\,dx,$$

and

$$\sigma^2 = \int_p^q (F^{-1}(s))^2\,ds - \mu^2 = \int_{p^*}^{q^*} x^2 f(x)\,dx - \mu^2.$$

The process $Q(t)$ is close to a Gaussian process with mean 0 and covariance

$$\rho_0(s,t) = \frac{\min(s,t) - st}{f(F^{-1}(s))\,f(F^{-1}(t))}.$$

The process $Y(t)$ is then close to a Gaussian process with mean 0 and covariance

$$\rho(s,t) = \rho_0(s,t) - \psi(s)\int_p^q \psi(u)\rho_0(u,t)\,du - \psi(t)\int_p^q \psi(u)\rho_0(s,u)\,du$$
$$\qquad - \int_p^q \rho_0(u,t)\,du - \int_p^q \rho_0(s,u)\,du + \int_p^q\!\!\int_p^q \rho_0(u,v)\,du\,dv$$
$$\qquad + \psi(s)\psi(t)\int_p^q\!\!\int_p^q \psi(u)\psi(v)\rho_0(u,v)\,du\,dv + (\psi(s) + \psi(t))\int_p^q\!\!\int_p^q \rho_0(u,v)\psi(u)\,du\,dv,$$

where

$$\psi(s) = \frac{F^{-1}(s) - \mu}{\sigma}.$$


The denominator of $Z = n\{1 - R^2\}$, where we write $Z$ for $Z(X,m)$ and $R^2$ for $R^2(X,m)$, is then close to $\sigma^2$, and the numerator is close to $T = \int_p^q Y^2(t)\,dt$. Thus the asymptotic theory now depends on the behaviour of $T$. It appears generally that this behaviour is determined by that of $\int_p^q Q^2(t)\,dt$. There are 3 cases in practice, which we label Cases A, B and C. Define

$$J_1 = \int_p^q\!\!\int_p^q \rho_0^2(s,t)\,ds\,dt, \qquad J_2 = \int_p^q \rho_0(t,t)\,dt.$$

Case A. In this case suppose $J_1 < \infty$ and $J_2 < \infty$. Then we have

$$Z = n(1 - R^2) \Rightarrow \frac{1}{\sigma^2} \sum_i v_i/\lambda_i$$

where $v_i$ are independent $\chi_1^2$ variables and $\lambda_i$ are eigenvalues of $f(s) = \lambda \int_p^q \rho(s,t) f(t)\,dt$. (The sum $\sum \lambda_i^{-1}$ will be $< \infty$.)

Case B. Suppose $J_1 < \infty$ but $J_2 = \infty$. Then there exists $a_n$ such that

$$Z - a_n = n(1 - R^2) - a_n \Rightarrow \frac{1}{\sigma^2} \sum_i \lambda_i^{-1}(v_i - 1),$$

where the $\lambda_i$ and $v_i$ are as defined above. (In this case $\sum \lambda_i^{-1} = \infty$.)

Case C. For this case suppose both integrals $J_1$ and $J_2$ are infinite. Then there exist constants $a_n$, $b_n$, such that

$$\frac{Z - a_n}{b_n} = \frac{n(1 - R^2) - a_n}{b_n} \Rightarrow N(0,1).$$

4.3 Examples.

1. The exponential distribution.

For $q = 1$ we have Case C, with $a_n = \log n$ and $b_n = 2\sqrt{\log n}$, so that

$$\frac{n(1 - R^2) - \log n}{2\sqrt{\log n}} \Rightarrow N(0,1).$$

For $q < 1$ we are in Case A and the distribution is a sum of weighted chi-squared variables.

2. The uniform test (discussed above).

For any $p$ or $q$ Case A applies and $(r - k + 1)(1 - R^2)$ has the same limiting distribution regardless of $p$, $q$.

3. The normal test.

For $p = 0$ or $q = 1$ or both we get Case B.

For $p > 0$, $q < 1$ we get Case A.

4. The logistic test; $F(w) = 1/(1 + e^{-w})$, $-\infty < w < \infty$.

For $p = 0$ or $q = 1$ or both we get Case C.

For $p > 0$ and $q < 1$, we get Case A. The logistic test is thus similar to the exponential test.

5. Test for the Extreme Value distribution I; $F(w) = 1 - \exp(-e^{w})$, $-\infty < w < \infty$.

For $p = 0$, we get Case C.

For $p > 0$ we get Case A.

6. Test for the Extreme Value distribution II; $F(w) = \exp(-e^{-w})$, $-\infty < w < \infty$.

For $q = 1$, we get Case C.

For $q < 1$, we get Case A.

Discussion. The discussion above is somewhat imprecise. When $p$ is 0 or $q$ is 1 there are technical details which have been glossed over. For the distributions we have studied, however, the criteria given in Cases A, B and C lead to the correct answer for asymptotic distributions of $Z(X,m) = n\{1 - R^2(X,m)\}$.
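How censoring moves a distribution between cases can be seen from the integral of $\rho_0(t,t)$ for the exponential distribution, where $f(F^{-1}(t)) = 1 - t$ and hence $\rho_0(t,t) = t/(1-t)$. The sketch below is an illustration only: the function names are hypothetical, $p = 0$ is assumed, and only this one integral is examined, not the double integral of $\rho_0^2$.

```python
import math


def rho0_diag_exponential(t):
    """rho_0(t, t) = t(1 - t) / f(F^{-1}(t))^2 for the exponential, i.e. t / (1 - t)."""
    return t / (1.0 - t)


def trace_integral_numeric(q, steps=20000):
    """Midpoint-rule approximation of the integral of rho_0(t, t) over (0, q), q < 1."""
    h = q / steps
    return h * sum(rho0_diag_exponential((j + 0.5) * h) for j in range(steps))


def trace_integral_closed(q):
    """Closed form of the same integral: -q - log(1 - q), for 0 <= q < 1."""
    return -q - math.log(1.0 - q)
```

For any fixed $q < 1$ the integral is finite, in line with Case A, while `trace_integral_closed(q)` grows without bound as $q$ approaches 1, in line with the uncensored exponential test falling outside Case A.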


Table 1. Critical Points for $Z_2$ and $Z_{2A}$.
Upper tail significance level (percent).

Statistic $Z_2$:

      n      50      25      10       5     2.5
      4   0.690   1.240    3.47    8.67    20.3
      6   0.763   1.323    2.59    4.74    8.49
      8   0.806   1.364    2.37    3.78    6.29
     10   0.832   1.388    2.34    3.40    5.30
     12   0.848   1.407    2.33    3.27    4.80
     18   0.877   1.438    2.32    3.12    4.26
     20   0.881   1.444    2.32    3.10    4.18
     40   0.907   1.470    2.32    3.03    3.82
     60   0.916   1.480    2.32    3.00    3.73
     80   0.920   1.485    2.32    2.99    3.71
    100   0.922   1.488    2.32    2.98    3.70
      ∞   0.932   1.497    2.31    2.98    3.67

Statistic $Z_{2A}$:

      n      50      25      15      10       5     2.5       1
      4   0.140   0.245   0.333   0.411   0.545   0.707   1.010
      6   0.166   0.287   0.379   0.467   0.616   0.796   1.065
      8   0.184   0.307   0.403   0.494   0.648   0.830   1.089
     10   0.193   0.320   0.420   0.512   0.670   0.848   1.102
     12   0.200   0.330   0.432   0.523   0.683   0.861   1.111
     18   0.209   0.346   0.452   0.543   0.705   0.882   1.121
     20   0.212   0.349   0.455   0.547   0.708   0.886   1.124
     40   0.224   0.362   0.472   0.563   0.727   0.903   1.138
     60   0.228   0.367   0.477   0.568   0.734   0.909   1.146
     80   0.229   0.369   0.479   0.570   0.736   0.911   1.149
    100   0.230   0.370   0.480   0.572   0.737   0.912   1.150
      ∞   0.233   0.374   0.485   0.578   0.744   0.917   1.155

Table 2. Critical Points for $Z_3$.


Upper tail significance level.

      n     0.5    0.25    0.15    0.10    0.05   0.025    0.01
      4   0.344   0.559   0.734   0.888   1.089   1.238   1.388
      6   0.441   0.703   0.901   1.053   1.325   1.590   1.918
      8   0.495   0.792   1.000   1.163   1.474   1.739   2.100
     10   0.535   0.833   1.068   1.245   1.532   1.846   2.294
     12   0.560   0.864   1.093   1.280   1.608   1.918   2.360
     18   0.605   0.940   1.147   1.348   1.672   2.008   2.503
     20   0.610   0.960   1.200   1.370   1.680   2.025   2.520
     40   0.640   0.980   1.215   1.396   1.732   2.076   2.580
     60   0.648   0.988   1.227   1.410   1.750   2.092   2.590
     80   0.658   0.997   1.228   1.418   1.760   2.104   2.610
      ∞   0.666   0.992   1.234   1.430   1.774   2.129   2.612